Validation of a series of walking and stepping tests to predict maximal oxygen consumption in adults aged 18–79 years

Introduction. Field tests to estimate maximal oxygen consumption (VO2max) are an alternative to traditional exercise testing methods. Published field tests and their accompanying estimation equations account for up to 80% of the variance in VO2max with an error rate of ~4.5 ml.kg-1.min-1. These tests are limited to very specific age-range populations. The purpose of this study was to create and validate a series of easily administered walking and stepping field equations to predict VO2max across a range of healthy 18-79-year-old adults.
Methods. One hundred fifty-seven adults completed a graded maximal exercise test to assess VO2max. Five separate walking and three separate stepping tests of varying durations, number of stages, and intensities were completed. VO2max estimation equations were created using hierarchical multiple regression. Covariates including age, sex, body mass, resting heart rate, distance walked, gait speed, stepping cadence, and recovery heart rate were entered into each model using a stepwise approach. Each full model had the same base model consisting of age, sex, and body mass. Validity of each model was assessed using a jackknife cross-validation analysis, and percent bias and root mean square error (RMSE) were calculated.
Results. Base models accounted for ~72% of the total variance of VO2max. Full models explained ~79–83% of the variance, and bias was minimal (<±1.0%) across models. RMSE for all models was approximately 4.5 ml.kg-1.min-1. Stepping tests performed better than walking tests, explaining ~2.5% more of the variance with a smaller RMSE.
Conclusion. All eight models accounted for a large percentage of VO2max variance (~81%) with an RMSE of ~4.5 ml.kg-1.min-1. The variance and level of error of the models examined highlight good group mean prediction, with greater error expected at the individual level. All the models perform similarly across a broad age range, highlighting flexibility in application of these tests to a more general population.


Introduction
Maximal oxygen consumption (VO2max) is a key indicator of health and cardiorespiratory fitness [1] and is considered a "clinical vital sign" and strong predictor of mortality [2]. The traditional, gold standard method to assess VO2max is open circuit spirometry in conjunction with a graded exercise test (GXT) to volitional fatigue. Open circuit spirometry, a method of indirect calorimetry, requires the use of a computerized metabolic measurement system to analyze expired gases to determine oxygen utilization [1]. A standard GXT protocol, typically performed on a treadmill or cycle ergometer, incrementally increases exercise intensity until the participant achieves VO2max [3]. Despite valuable information obtained from VO2max testing, it is not always feasible in certain settings. The cost of the equipment required to complete such tests is high, and testing requires trained professionals, often making this form of testing inaccessible to the general public. Economic factors aside, VO2max testing is not always a safe option for certain populations [1], such as the elderly who are at a higher risk for falling or those with an increased risk of experiencing an adverse cardiac event during vigorous exercise. Submaximal VO2 testing to predict VO2max is an alternative to traditional maximal testing without requiring the participant to work to a maximal intensity [1]. Two popular submaximal modalities are the treadmill and cycle ergometer [4][5][6][7][8]. Similar to maximal exercise testing, the cost associated with submaximal VO2 testing can be high and requires specialized equipment and trained personnel. Submaximal field testing, which involves simple equipment and measures (e.g. distance wheel, heart rate monitor), is another alternative to maximal exercise testing. Traditionally, these alternative, low cost options include over-ground walking/running [9][10][11] or stepping tests [7,12,13]. These tests can provide a safe testing alternative for high risk populations and can be easily administered in the field or clinical setting with little expense to estimate VO2max.
Ease of delivery and physical burden of a test are only two components to consider when selecting a field test to estimate VO2max. How well a field test prediction equation estimates VO2max, as determined through methodological validation research, and what population(s) the test is designed for are also important factors to consider. Explained variance and error of the estimate reported in the literature fluctuate among submaximal field tests predicting VO2max, with the highest performing prediction equations explaining approximately 80% of the variance with an error of approximately 4.5 ml.kg-1.min-1 [8,10]. Unfortunately, a limitation within the current body of literature is a lack of consistency in validation and reporting efforts [8]. Additionally, many of the published field tests tend to target homogeneous groups of recreationally active young adults [6,12] or adults with a narrow age range [10], with few studies developing and comparing field tests across a broad age range [13,14]. Further, the modalities of these tests may be deemed inappropriate for certain populations, limiting their application to a broad, generalized population. Thus, there is a scientific need to examine the precision and accuracy of easily administered, low cost, submaximal field tests that span a wide age range. Accordingly, the purpose of this study was to determine the validity of various walking and stepping tests to predict VO2max among a broad age range of adults.

Participants and study overview
This study had a cross-sectional design that spanned three days and two different settings. Day one of testing took place within a university laboratory on a large, midwestern campus. There, participants completed demographic, anthropometric, and VO2max assessments, using the equipment and techniques outlined under the measures section. Days two and three took place at a separate, on-campus gymnasium with a climate-controlled environment and a 200-meter indoor track. These testing days comprised different walking and stepping exercise tests. One hundred and sixty-two individuals were recruited based on the following inclusion criteria: (a) age between 18 and 79 years; (b) ambulatory (i.e. free of any walking limitations, such as use of an assistive device or amputation); (c) able to walk on a treadmill; and (d) healthy as determined by a physical examination within the past three years. Individuals were excluded if they: (a) had a diagnosis of a cardiovascular, metabolic, or pulmonary condition; (b) were pregnant or nursing; or (c) had a history of severe arthritis or other orthopedic conditions. Participants were recruited via telephone, flyers, and word of mouth from a large, metropolitan area and surrounding communities. This study was approved by the University of Wisconsin-Milwaukee Institutional Review Board, #08.298. Written informed consent from the participants was obtained prior to enrollment in the study.

Measures
Demographic and anthropometric assessment. Participants completed a health history questionnaire that assessed current health status and family health history. Height was measured to the nearest quarter of an inch using a stadiometer (Detecto, Webb City, MO, USA) and weight was measured to the nearest quarter of a pound using a calibrated physician's scale (Detecto, Webb City, MO, USA); body mass index (BMI) was calculated from these measurements. Resting blood pressure and heart rate were assessed using auscultation and palpation, respectively, following standard procedures [15].
Maximal exercise test. A modified Balke treadmill protocol [1] was used to measure VO2max. Participants were fitted with a 3-way, non-rebreathing mouthpiece, nose clip, and head support (Hans-Rudolph) that were connected by a tube to a metabolic cart (TrueOne 2400, ParvoMedics, Sandy, UT, USA) to assess expired gas. Measurement of oxygen consumption through expired gases using this metabolic cart has been previously validated against the traditional Douglas bag method. Specifically, excellent accuracy and precision were reported for gas exchange variables, and VO2 was found to differ by [0.018] l/min [4]. Heart rate and electrical activity were monitored using a 12-lead EKG (Case System, GE Healthcare, USA). For a test to be considered a maximal exercise test, the participant had to reach volitional fatigue or meet the following criteria: a plateau in VO2 (<2.1 ml/kg/min between two stages), a respiratory exchange ratio of 1.1 or greater, and a heart rate within 10 bpm of age-predicted maximal heart rate (220 - age) [16].
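The criteria above can be expressed as a short check. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function names and argument layout are hypothetical, the plateau is read as a change in VO2 of less than 2.1 ml/kg/min between the final two stages, and "within 10 bpm" is read as a peak heart rate no more than 10 bpm below the age-predicted maximum.

```python
def age_predicted_max_hr(age_years):
    """Age-predicted maximal heart rate (220 - age), as cited in the text."""
    return 220 - age_years


def meets_max_criteria(vo2_prev_stage, vo2_last_stage, rer, peak_hr, age_years):
    """Return True if all three physiological criteria are met.

    Assumptions (not specified in the source): VO2 values are in ml/kg/min,
    and a peak heart rate above the age-predicted maximum also counts.
    """
    plateau = abs(vo2_last_stage - vo2_prev_stage) < 2.1
    rer_ok = rer >= 1.1
    hr_ok = peak_hr >= age_predicted_max_hr(age_years) - 10
    return plateau and rer_ok and hr_ok


def is_maximal_test(volitional_fatigue, **criteria):
    """A test counts as maximal on volitional fatigue or when the criteria are met."""
    return volitional_fatigue or meets_max_criteria(**criteria)


# Hypothetical 50-year-old participant: age-predicted maximum is 170 bpm.
example = dict(vo2_prev_stage=40.0, vo2_last_stage=41.0, rer=1.15,
               peak_hr=165, age_years=50)
print(is_maximal_test(False, **example))
```

In this hypothetical case the VO2 change is 1.0 ml/kg/min, the respiratory exchange ratio exceeds 1.1, and the peak heart rate is within 10 bpm of 170, so the test would qualify as maximal even without volitional fatigue.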
Field tests. During the field tests, participants were fitted with a heart rate monitor (Polar, Polar Electro Inc., Bethpage, NY, USA) to measure recovery heart rate. All tests were separated by a minimum of 5 minutes of seated recovery, with additional time given if the participant deemed it necessary. A return of heart rate to baseline before each new test was used as a further marker of sufficient rest between tests. This was consistent for each field test.
Walking tests. Participants completed a series of over-ground walking tests (Table 1). Total distances (m) for single-stage tests and individual-stage distances for ramped-intensity, multi-stage tests were measured using a Pittsburgh brand 10,000 ft/m distance wheel. Walking speed (m.s-1) was calculated by dividing distance by time and was recorded for single-stage tests and for individual stages of ramped-intensity protocol tests. Walking speeds were selected for ease of administration. Depending on the protocol (Tests 3-5), participants were instructed to walk at a self-selected slower than normal, normal, and/or faster than normal walking speed. Additionally, the progressive nature of these walking tests emulates traditional graded exercise tests. Recovery heart rate was recorded at 30-second time points for two minutes after each test.
Step tests. Test duration, stages per test, and stepping cadence were selected to mimic the progressive nature of traditional graded exercise tests (Table 2). Step height was selected to mimic traditional step height (e.g. on a flight of stairs) and two different heights were selected to further modify intensity levels. Stepping cadence was assigned based on age (Table 3) with the older age group(s) starting at a lighter intensity than the younger age group(s), to ensure that the test remained submaximal. Recovery heart rate was recorded at 30-second time points for two minutes after each test.

Statistical analysis
Statistical analysis was completed in SPSS Version 22. Hierarchical regression analysis (using stepwise selection) was used to build models to predict VO2max. The base model for each equation consisted of age (years), sex (male = 1, female = 0), and body mass (kg), and was entered as the first step of the model. Resting heart rate (bpm) and recovery heart rate (bpm) variables were entered into each model. Walking distance (m) and walking speed (m.s-1) were entered into walking test models, and step cadence (bpm) and step height (in) were entered into step test models. For ramped protocol walking tests, individual-stage distance, individual-stage speed, total distance, and average speed were included when building the equations. Variables that significantly predicted VO2max were kept in the model, while variables that did not significantly predict VO2max were excluded. Only main effects were considered due to sample size limitations. The models resulting from the hierarchical and selection process were tested for multicollinearity using the variance inflation factor (VIF). Variables identified with a high VIF (>1.0) were removed from the model. Explained variance (R2), adjusted R2 (R2adj), and root mean square error (RMSE) were generated for each model.
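The two-step model building described above can be illustrated with a short sketch. This is not the SPSS procedure used in the study: the data are synthetic, the coefficients used to simulate VO2max are invented for illustration, the stepwise significance testing is omitted, and RMSE is computed as the square root of the mean squared residual, which is an assumption about the definition. The function names `fit_ols` and `vif` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; returns (beta, R2, adjusted R2, RMSE)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    rmse = float(np.sqrt(ss_res / n))  # assumed definition of RMSE
    return beta, r2, r2_adj, rmse


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R2 of that column on the rest)."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2_j, _, _ = fit_ols(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Synthetic sample of 157 "participants" (all values are invented).
rng = np.random.default_rng(0)
n = 157
age = rng.uniform(18, 79, n)                 # years
sex = rng.integers(0, 2, n).astype(float)    # male = 1, female = 0
mass = rng.normal(75, 12, n)                 # kg
rec_hr = rng.normal(100, 15, n)              # 30-second recovery heart rate, bpm
vo2 = 65 - 0.3 * age + 6 * sex - 0.1 * mass - 0.1 * rec_hr + rng.normal(0, 4, n)

base = np.column_stack([age, sex, mass])     # step 1: base model
full = np.column_stack([base, rec_hr])       # step 2: add a test-specific predictor

_, r2_base, r2_adj_base, rmse_base = fit_ols(base, vo2)
_, r2_full, r2_adj_full, rmse_full = fit_ols(full, vo2)

print(f"base: R2={r2_base:.3f}, R2adj={r2_adj_base:.3f}, RMSE={rmse_base:.3f}")
print(f"full: R2={r2_full:.3f}, R2adj={r2_adj_full:.3f}, RMSE={rmse_full:.3f}")
print("VIF:", np.round(vif(full), 3))
```

Because the base predictors are nested within the full model, the full model's R2 can never be lower than the base model's; adjusted R2 is the statistic that penalizes the added predictor.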
Each regression equation was then cross-validated using the jackknife (leave-one-subject-out) method [17] in SAS Version 9.4. Bias and RMSE were calculated for each test predicting VO2max. Bland-Altman plots [18] and 95% limits of agreement (LoA, SD of the differences × 1.96) were created, and a t-test for differences between measured and predicted VO2max values was performed. Significance for all tests was set at p<0.05.
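The leave-one-subject-out procedure and the agreement statistics can be sketched as follows. This is an illustration on synthetic data rather than the SAS analysis used in the study. Two definitions are assumptions, since the text does not state them: percent bias is taken as the mean (predicted minus measured) difference divided by the mean measured value, and the limits of agreement are taken as the mean difference ± 1.96 × SD of the differences. The function names are hypothetical.

```python
import numpy as np


def loo_predictions(X, y):
    """Predict each subject from a model fitted to all other subjects."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # drop subject i
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ beta                    # predict the held-out subject
    return preds


def agreement_stats(measured, predicted):
    """Percent bias, RMSE, and 95% limits of agreement (assumed definitions)."""
    diff = predicted - measured
    bias_pct = 100.0 * diff.mean() / measured.mean()
    rmse = float(np.sqrt((diff ** 2).mean()))
    sd = diff.std(ddof=1)
    loa = (diff.mean() - 1.96 * sd, diff.mean() + 1.96 * sd)
    return bias_pct, rmse, loa


# Synthetic data (invented coefficients; not the study's data).
rng = np.random.default_rng(1)
n = 157
age = rng.uniform(18, 79, n)
sex = rng.integers(0, 2, n).astype(float)
mass = rng.normal(75, 12, n)
vo2 = 55 - 0.3 * age + 6 * sex - 0.1 * mass + rng.normal(0, 4, n)
X = np.column_stack([age, sex, mass])

preds = loo_predictions(X, vo2)
bias_pct, rmse, loa = agreement_stats(vo2, preds)
print(f"bias = {bias_pct:.4f}%, RMSE = {rmse:.3f}, LoA = ({loa[0]:.3f}, {loa[1]:.3f})")
```

Each held-out prediction comes from a model that never saw that subject, so the resulting bias and RMSE describe how the equation might behave on new individuals rather than how well it fits the development sample.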

Results
Five of the 162 participants recruited did not qualify for the study. Of the final 157 participants, two-thirds of the sample was female (66%) and the average age was 48.9 ± 17.4 years (mean ± SD). Average measured VO2max was 34.3 ± 10.1 ml.kg-1.min-1 and average BMI was 25.7 ± 4.3 kg.m-2. Participant characteristics broken down by sex are presented in Table 4.

Base model
The base model for each regression equation included age (years), sex (male), and body mass (kg). While the specific values for the base model varied among tests, this model alone accounted for ~72% of the variance in VO2max and the RMSE was approximately 5.45 ml.kg-1.min-1. Age and body mass had a negative relationship with VO2max.
Summary of each step test. Cadence was assigned based on age and stage of test (see Table 3).
Walking regression equations. Walking regression results are presented in Table 5. Gait speed and recovery heart rate were common predictors among the walking equations. Gait speed, when significant, had a positive relationship with VO2max, where a faster-selected gait speed was associated with a higher VO2max. For the tests with multiple stages (Tests 3-5), slower than usual gait speed was never a significant predictor. Heart rate variables varied among the tests and included 30- or 60-second recovery heart rate. All heart rate variables had a negative relationship with VO2max.
Stepping regression equations. Stepping regression results are presented in Table 6. Thirty-second recovery heart rate was a significant predictor for each step test. Like the walking tests, heart rate variables were negatively related to VO2max. Test 8 performed better than any of the other tests (walking or stepping) for predicting VO2max (R2 = 0.835, R2adj = 0.830, and RMSE = 4.138 ml.kg-1.min-1).

Jackknife validation results
Results of the jackknife validation revealed that bias was relatively small for each test, with each model reporting a bias well within ±1%. Root mean square error ranged from 4.102 ml.kg-1.min-1 to 4.662 ml.kg-1.min-1, for Test 8 and Test 1, respectively. Jackknife results are presented in Table 7.
Of the walking tests, the model for Test 2 still accounted for the greatest explained variance in VO2max with a jackknife adjusted R2 of 0.824 and RMSE of 4.287 ml.kg-1.min-1, and bias of -0.0000421% and 0.0000406%, respectively. Of the stepping tests, the model for Test 8 accounted for the greatest explained variance in VO2max with a jackknife adjusted R2 of 0.834 and RMSE of 4.102 ml.kg-1.min-1, and bias of -0.0000411% and 0.000104%, respectively. Bland-Altman plots were created for Test 2 (Fig 1) and for Test 8 (Fig 2). Plots show mean error to be close to zero, and LoA of +8.599 to -8.599 ml/kg/min (t-test, -0.000445) for Test 2

Discussion
The purpose of this study was to determine the validity of several easily administered walking and stepping field tests to predict VO2max across a broad age range. We found that among all eight tests examined, the 9-minute stepping test with three stages, using an 8-inch step, yielded the highest bias-adjusted R2 (0.834) and lowest RMSE (4.102 ml.kg-1.min-1) while maintaining minimal bias, well within ±1%. Overall, the stepping tests outperformed the walking tests for predicting VO2max, with the highest bias-adjusted R2 values and lowest RMSE. Of the walking tests, a single-stage, two-minute test to walk as far as possible yielded the highest bias-adjusted R2 (0.824) and lowest RMSE (4.287 ml.kg-1.min-1), also maintaining a minimal bias within ±1%.
Three widely used field tests are the Queen's College Step Test [12], the Cooper 12-minute run [9], and the one-mile walk test [10]. The Queen's College Step Test is a 3-minute, single-stage step test that requires participants to maintain a cadence of 22 steps/min as they step up and down from a 16.25-inch step and then manually measure and record recovery heart rate [12]. Although its single stage makes the test short, a step height close to a foot and a half makes it rigorous, and concerns related to balance and fall risk need to be considered. In contrast, the steps used in the current study are 6 and 8 inches tall, which is comparable to a standard step height.
Stepping tests can be difficult to administer at times, as they require the participant to maintain a certain cadence while stepping up and down. A benefit of walking and running tests is that the participant can self-regulate. For example, the Cooper 12-minute run test instructs participants to cover as much ground as possible within the time frame, and the one-mile walk test instructs them to walk as quickly as possible to complete the mile [9,10]. The simplest of the walking tests in the current study was a two-minute test that asked participants to cover as much ground as possible while still maintaining a walk. These simple instructions paired with a short duration make this test very easy to administer and highly achievable for most individuals. Further, as the participants are walking, it is possible to measure the distance as they go, unlike the Cooper 12-minute run where distance can be difficult to gauge depending on the location of the test.
The field tests in the current study performed well when predicting VO2max, accounting for approximately 80% of the variance and yielding an RMSE of approximately 4.5 ml.kg-1.min-1. The Queen's College Step Test reports a low R2 value of 0.563 [12], which accounts for 30% less of the variance of VO2max than our highest performing step test. The Cooper 12-minute run and the one-mile walk test report explained variances for VO2max of around 77% and 81%, respectively [9,10]. The explained variance for both the one-mile walk and Cooper 12-minute test is similar to, albeit lower than, the explained variance we report for our walking tests in the current study. McArdle et al. report a standard error; however, the units are in ml.min-1, making it difficult to compare error rates among tests [12]. Cooper did not report an error for the 12-minute run estimation equation [9], but the one-mile walk test reported an associated error of 5.0 ml.kg-1.min-1 [10], which is marginally higher than what we report in the current study. Error associated with an equation can impact the interpretation of a score. Too large an error of the estimate can make it difficult to detect true change in a variable (i.e. VO2max), and thus smaller error is preferred.
Test 5: 9-minute walk (3-minute stages); stage 1: slower than normal walking speed, stage 2: normal walking speed, stage 3: faster than normal walking speed. Walking speeds were self-selected.
Cross-validation analysis showed that our tests yielded minimal bias, meaning that the estimated VO2max values were very similar to the measured VO2max values. Unfortunately, there is inconsistency within the literature regarding validation reporting efforts, including the three previously published field tests listed above [9,10,12]. Kline and colleagues did, however, perform a cross-validation analysis in a separate sample and reported a final, adjusted variance of 77% (R2 = 77.4) and standard error of 4.4 ml.kg-1.min-1 [10]. Although the error is similar to the ones we report here, the explained variance is lower than we found in the current study.
Some considerations are warranted when utilizing any of the field tests we report on. First, when considering feasibility and safety, the 9-minute stepping test using an 8-inch step might not be appropriate for elderly or frail populations. As there was minimal difference in equation performance between the 9-minute stepping test using a 6-inch step and the 6-minute stepping test using a 6-inch step (~1% in variance and ~0.1 ml.kg-1.min-1 in error), the shorter duration test with the shorter step could be a safer, more practical option. Still, any form of stepping test carries some risk of falls. The two-minute over-ground walking test could be the best option for a quick estimation of VO2max as it requires minimal equipment and is shorter in duration. Additionally, the instructions are simple ("cover as much ground as possible in two minutes"), whereas the stepping tests require a ramped cadence protocol which could cause confusion. The two-minute walking test accounts for a similar amount of variance in VO2max as the stepping tests (~82%) and has a similar level of error (~4.2 ml.kg-1.min-1).
This study is not without limitations. First, the sample size was relatively small, which limited the analysis to main effects only. Future studies should aim for a larger sample to allow for the investigation of interactions, which could strengthen the model(s) to better predict VO2max. Second, while these models are statistically sound, the application of these measures warrants further investigation. In a clinical setting or as a baseline estimate, any of these tests should be acceptable for estimating VO2max. The testing environment should also be considered when administering these tests, as they were developed in a climate-controlled environment. Factors such as temperature, humidity, and wind could impact test results, thus altering the reliability of the estimation. Further, these models were developed in healthy adults, so the results are limited to that population. Finally, despite assessing how well our models performed compared to the traditional gold standard of open circuit spirometry for assessing VO2max, we did not compare our models to previously validated field tests, which may have been a beneficial comparison to make.
In conclusion, this study generated VO2max estimation equations from eight different stepping and over-ground walking field tests. A jackknife cross-validation assessment followed the creation of each equation to provide information on the bias of each equation. By incorporating this bias, which was small, each equation accounted for ~80% of the variance for predicting VO2max with an error of ~4.5 ml.kg-1.min-1. These results highlight that the reported tests perform well to estimate group mean VO2max values, but larger error would be expected for a given individual, as the Bland-Altman plots display errors of ±8-9 ml.kg-1.min-1. Compared to previously published field tests, the tests presented here are appropriate for a broad age range and are simple to administer, requiring minimal equipment.
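The gap between group-level and individual-level error can be made concrete with a short worked calculation. It assumes the limits of agreement are symmetric about a mean difference of approximately zero, as reported for Test 2 (±8.599 ml.kg-1.min-1). The symbols $\bar{d}$ (mean difference) and $s_d$ (SD of the differences) are introduced here for illustration, and the predicted value of 34.3 is simply the sample mean VO2max used as a hypothetical example.

```latex
\[
\text{LoA} = \bar{d} \pm 1.96\, s_d, \qquad \bar{d} \approx 0
\;\Rightarrow\;
s_d \approx \frac{8.599}{1.96} \approx 4.39\ \text{ml}\cdot\text{kg}^{-1}\cdot\text{min}^{-1}
\]
\[
\widehat{\text{VO}}_{2\max} = 34.3
\;\Rightarrow\;
34.3 \pm 8.6 \approx [25.7,\ 42.9]\ \text{ml}\cdot\text{kg}^{-1}\cdot\text{min}^{-1}
\]
```

Under these assumptions, the implied SD of the differences (about 4.39) is close to the RMSE reported for Test 2 (4.287), which is what one would expect when bias is near zero. The individual-level range of roughly ±8.6 ml.kg-1.min-1 is about twice that value.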